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While seemingly straightforward in principle, the reliable estimation of rate constants is seldom 
easy in practice. Numerous issues, such as the complication of poor reaction coordinates, cause 
obvious approaches to yield unreliable estimates. When a reliable order parameter is available, the 
reactive flux theory of Chandler allows the rate constant to be extracted from the plateau region of 
an appropriate reactive flux function. However, when applied to real data from single-molecule ex- 
periments or molecular dynamics simulations, the rate can sometimes be difficult to extract due to 
the numerical differentiation of a noisy empirical correlation function or difficulty in locating the 
plateau region at low sampling frequencies. We present a modified version of this theory which 
does not require numerical derivatives, allowing rate constants to be robustly estimated from the 
time-correlation function directly. We compare these approaches using single-molecule force spec- 
troscopy measurements of an RNA hairpin. 

Section: Kinetics, Spectroscopy or Statistical Mechanics, Thermodynamics, Medium Effects 



17 The observed dynamics of complex molecular systems 
is such as biomolecules often suggest a simple underlying be- 

19 havior. Much of chemistry and biophysics revolves around 

20 attempting to identify simple models that adequately de- 

21 scribe the observed complex dynamics of these systems. 

22 In many cases, stochastic conformational dynamics can be 

23 modeled to good accuracy using simple first-order phe- 

24 nomenological rate theory, a topic that has been extensively 

25 studied theoretically [1,2]. However, when it is necessary to 

26 estimate rates from trajectories generated by computer sim- 

27 ulation or observed in single-molecule experiments, numer- 

28 ous pitfalls can frustrate the ability to extract robust, reliable, 

29 and accurate estimates of rate constants using seemingly ob- 

30 vious approaches. Here, we demonstrate these pitfalls for 

31 naive approaches to rate estimation in single-molecule force 

32 spectroscopy for an RNA hairpin, and show how reactive flux 

33 theory [3-7] and a novel but related variation can provide 

34 robustness to sampling frequency, finite statistics, and mea- 

35 surement noise. 

36 Rate theory. Suppose we have a population of N noninter- 

37 acting molecules in solution that can occupy one of two con- 
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formational states, denoted A and B. Without loss of gener- 
ality, we assume we are given a trajectory of some order pa- 
rameter x(t) that allows us to define associated occupation 
functions h,A{t) and hB{t) for states A and B, such that 



h A (t) = 



1 if x{t) < x t 
if x{t) > x* 



h B (t) = 



if x(t) < x i 

1 if x(t) > x i 



If there is a separation of timescales between the short re- 
laxation time within the conformational states and the long 
time the system must wait, on average, in one conforma- 
tional state before undergoing a transition to another state, 
the asymptotic relaxation behavior of an initial population of 
Na(0) molecules in conformation A and Nb (0) molecules in 
conformation B can be described by a simple linear rate law: 



-k A ^B N A (t) + k B ^ A N B (t) 



(1) 



49 where Iza^b and ks^A are microscopic rate constants. In 

50 terms of time-dependent expectations over trajectories initi- 

51 ated from some initial nonequilibrium state, Eq. 1 is equiva- 

52 lent to 



d_ 



<M*)>,. 



-k A ^ B (h A (t)) ne + k B ^A (h B (t)) ne (2) 



53 where (h A (t)} ne denotes the nonequilibrium probability of 

54 finding a given molecule in conformation A at time t given 

55 that the fraction of molecules that were initially in conforma- 

56 tionyl was (h A (0)) ne = N A (0)/N. 

57 Were Eq. 2 to govern dynamics at all times, the expected 

58 fraction of molecules in conformation A as a function of time 



2 



59 would be given by an exponential decay function, 

(M*))™ = (M + [<M0)> ne - <A A >] e - 



(3) 



where Qia) denotes the standard equilibrium expectation of 
Iia, giving the equilibrium fraction of molecules in confor- 
mation A. The quantity k = kA-^a + ks^A denotes the phe- 
nomenological rate constant because it is the effective rate 
that dominates the observed exponential asymptotic relax- 
ation decay behavior. It is the estimation of this quantity k, 
that will be our primary concern. 

If the system were purely two-state, such that Eq. 3 held for 
all time t > 0, a number of naive approaches to estimation 
of the phenomenological rate constant k from observed tra- 
jectory data would yield useful rate estimates. For example, 
given an observed trajectory x(t), we could simply compute 
the number of times n c the dividing surface x f was crossed in 
either direction in total trajectory time i bs, estimating the k 
rate by, 




60 P(x) 



FIG. 1 . Force trace of p5ab RNA hairpin in a stationary op- 
tical trap. A 60-second observation is shown, where the force 
history x(t) recorded at 50 kHz and subsampled to 1 kHz is 
plotted. A histogram of the observed force values is shown as 
P(x) to the right. The red line indicates the optimal dividing 



surface for rate calculations, : 



12.57 pN. 



t-ohs 



(4) 



75 Alternatively we could partition the trajectory into segments 

76 where the system remains on one side of x 1 in each segment, 

77 and estimate the mean lifetime r of these segments, from 

78 which the rate k is estimated by, 



fciif, 



(5) 



Both approaches will yield rate estimates that converge to the 
true rate k as t b S — > oo when x provides a perfect reaction co- 
ordinate for a perfectly two-state system, in that a;* correctly 
divides the two conformations states that interconvert with 
first- order kinetics. 

However, when considering trajectories obtained from 
computer simulations or single-molecule experiments with 
imperfect dividing surfaces, these naive approaches can lead 
to substantially erroneous estimates. First, we do not ex- 
pect Eq. 3 to hold for short times t < r mo i, where r mo i is the 
timescale associated with relaxation processes that damp out 
recrossings that occur due to imperfect definition of the sepa- 
ratrix between the reactant and product states [3, 4, 7]. An al- 
ternative view of this is that the observed coordinate x might 
function as a good order parameter, in that it allows the con- 
formational states to be well-resolved at extreme values of x, 
but a poor reaction coordinate, in that both conformational 
states are populated in some region near the optimal dividing 
surface a;* [8-10] (which is optimal in that it minimizes the 
rate estimate in a variational sense [11]). The rate estimates 
from Eqs. 4 and 5 will therefore overestimate the number of 
crossings or underestimate the state lifetimes, instead con- 
verging to the transition state theory rate estimate fc T sT that 
gives the instantaneous flux across the dividing surface, 



&TST 



d (h A (0)h B (t)) 



dt 



(6) 



and hence overestimating the true rate k. Additionally, if the 
observed trajectories are not continuous, but instead con- 
sist of discrete observations made with a sampling resolu- 
tion At, additional issues develop. As the sampling interval 
At increases, some crossing of the dividing surface will 
be missed, and the perceived lifetimes of states will be in- 
creased, having the opposite effect of a poor reaction coor- 
dinate in diminishing the rate estimates of Eqs. 4 and 5. As a 



result, it can be difficult to predict whether the overall result 
is an underestimate or overestimate of the true rate k. An ex- 
ample illustrating these effects for a model system where the 
true rate is known is given in the Supplementary Material. 

Application of naive rate estimators to single-molecule 
data. To understand how these pathologies can affect real 
measurements, we examined the behavior of the p5ab RNA 
hairpin in an optical trap under passive conditions. This hair- 
pin has been the subject of previous single-molecule force 
spectroscopy studies [12-14], and exhibits apparent two- 
state kinetics as the hairpin folds and unfolds under an ex- 
ternal biasing force. The force trace x(t) is shown in Fig. 1, 
and reports the instantaneous force on the optically trapped 
bead along the bead-bead axis; for a harmonic trap, this force 
is linearly proportional to the displacement of the bead from 
the center of the trap, and hence the bead-to-bead exten- 
sion. As the hairpin folds, the bead-to-bead distance con- 
tracts, increasing the applied force as the polystyrene bead 
29 conjugated to the end of the polymer moves away from the 
center of the optical trap. At the stationary trap position used 
for data collection, the hairpin makes many transitions be- 
tween the two states resolvable from the measured force in 
the 60-second trajectory, populating each state nearly equally 
(Fig. 1). Data was collected at 50 kHz using a dual-beam 
counter-propagating optical trap [15, 16], a high sampling 
rate far above the corner frequency for bead response under 
these conditions, as previously published [14]. To examine 
the dependence on sampling interval At, the data was also 
subsampled to 1 kHz, a frequency found to be below the cor- 
ner frequency of the bead, such that the bead velocity has 
decorrelated between sequential observations due to hydro- 
dynamic interactions [14]. 

The rate constant was estimated using the naive cross- 
ing rate fc crosS m gs (Eq. 4) as a function of the dividing surface 
choice k*, and plotted in Fig. 2 (middle upper and lower pan- 
els, red lines). Two issues are quickly discerned: First, near 
the optimal choice of dividing surface (a;* ~ 12.57 pN), the 



estimated rate k c 



differs greatly depending on whether 



the 1 kHz data (red dashed line) or 50 kHz data (red solid line) 
were used to compute the rate estimate, yielding disparate 
estimates of 45.4 s _1 and 552 s _1 , respectively. Second, as 
the dividing surface is perturbed slightly, the rate estimate 
for either sampling rate changes rapidly. Both properties are 
highly undesirable, as practical estimators of the rate should 
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FIG. 2. Dependence of rate estimates on dividing surface. 

Top: Histogram of observed forces near transition region be- 
tween conformational states. Upper middle: Rate estimate 
from crossing rate fc crosS m g (red lines), reactive flux rate esti- 
mated £)rf (t) near plateau time of r = 3 ms (green lines) , and 
implied rate fej m (r) evaluated at r = 60 ms (black lines), es- 
timated from 1 kHz data (dashed lines) or 50 kHz data (solid 
lines) as a function of dividing surface x x choice. Lower mid- 
dle: Same as upper middle, but close-up view of rate esti- 
mates below200 s -1 . Bottom: Estimates of equilibrium prob- 
abilities 7T.4 and 7tb estimated from 1 kHz data as a function 
of dividing surface placement a:*. (Estimates of tta and 7r s 
from 50 kHz data are visually indistinguishable from 1 kHz 
estimates.) 



yield results insensitive to the sampling rate and exact place- 
ment of dividing surface. 

Reactive flux theory. To deal with the problems inherent in 
using an imperfect reaction coordinate or dividing surface, 
Chandler (and subsequent workers) demonstrated how the 
phenomenological rate could be recovered through the use of 
time-correlation functions, proposing the reactive flux fc RF (t) 
be computed [3-6] to estimate k, 



d (5h A (0) Sh A (t)) 
dt (5h' A ) 



(7) 



where 8h A (t) = hA(i) — (/ia) is the instantaneous deviation 
from the equilibrium population for some trajectory x(t). 
The reactive flux function &rf (t) measures the flux across the 
boundary between A and B that is reactive, in the sense that 
the system has crossed a dividing surface placed between A 
and B at time zero and is located on the product side of the 
boundary at time t. The reactive flux is bounded from above 
by the transition state theory rate estimate fcrsT, the instan- 
taneous flux across the boundary, because recrossings back 
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FIG. 3. Reactive flux and implied rates from p5ab hair- 
pin single-molecule force trajectory. The implied rate fei m (t) 
(red) and reactive flux rate correlation function knp (t) (black) 
are computed for the optimal dividing surface a;* « 12.57 pN 
for 1 kHz (left) and 50 kHz (right). Close-up views compare 
the scatter in the rate estimates in the plateau region (3-4 ms) 
and long correlation times (59-60 ms) for 50 kHz data {right 
insets) . 



72 to the reactant state will diminish the reactive flux; fc RP (£) be- 

73 comes identical to &tst as t — > + [3] . At t larger than some 

74 T mo i, thermalization processes will cause recrossings to die 

75 out, and the molecule will be captured either in its reactant 

76 or product states and remain there for a long time. As a re- 

77 suit, the asymptotic rate constant (whose existence requires 

78 the presupposed separation of timescales) is only obtained at 

79 r mo i < t <S T rxn , where fcRp(f) reaches a plateau value. fcRF(i) 
subsequently decays to zero at t » r rxn with a time constant 

si of T rxn = l/fc [3, 4]. Subsequent work extends these concepts 

82 to the case of multiple conformational states [5, 6] . 

83 Application of reactive flux theory to single- molecule data. 

84 We computed the reactive flux fc R F(t) from this force trajec- 

85 tory for both 1 kHz and 50 kHz sampling frequencies, us- 
ee ing one-sided finite-differences to estimate the derivative in 
87 Eq. 7. When estimated from 50 kHz data (Fig. 3, right), the re- 
sa active flux fc R ,F (t) smoothly stabilizes to ~ 36 s _1 after a tran- 

89 sient time of r mo i « 3 ms. This is the plateau time t = 3 ms 

90 for which &RF(i) ~ k, with t mo \ < t <g; r rxn , where r rxn w 28 

91 ms. When the reactive flux fcRF(t) is evaluated at t = 3 ms 

92 for various choices of dividing surface in this transition re- 

93 gion (Fig. 2, middle panels), the reactive flux rate is indeed 

94 insensitive to the choice of x i at both 50 kHz (solid green 

95 line) and 1 kHz (dashed green line) sampling frequencies. In 

96 this respect, the reactive flux approach provides a much more 

97 robust way to estimating rates than the naive estimators of 

98 Eqs. 4 and 5. To our knowledge, this represents the first time 

99 this theory has been applied to single-molecule experiments. 

200 Implied rate theory. The universal application of the re- 

201 active flux approach to rate estimation from single-molecule 

202 and computer experiments still presents a number of practi- 

203 cal difficulties. Because the correlation function is estimated 

204 from a trajectory x(t) sampled with discrete time resolution 

205 At, computation of the time derivative in Eq. 7 by finite- 

206 difference methods can often introduce unacceptably large 

207 amount of noise in the resulting estimate of A;rf (t) (Fig. 3, 

208 right inset, black dots). Alternatively, the correlation function 

209 {Sh,A(0)ShA(t)) could be smoothed by fitting a polynomial to 

210 produce a continuous estimate of the derivative, but this in- 

211 troduces a bias that is difficult to quantify. Additionally, if the 
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212 reaction timescale r rxn is not very long compared to the ob- 

213 servation interval At, then the plateau region where /crf (t) is 
2« identical to the rate may be small and difficult to detect be- 

215 fore /crf (i) decays to zero. This can be seen in the reactive 

216 flux feRp(i) estimated from the 1 kHz data (Fig. 3, left, black 

217 dots), where the plateau region near 3-4 ms is relatively nar- 

218 row and difficult to detect, and the &rf (t) falls (decaying as 

219 ke~ kt ) as t reaches times comparable to r rxn . Lastly, while 

220 alternative expressions to Eq. 7 exist where the velocity nor- 

221 mal to the separatrix at the time of barrier crossing is utilized 

222 instead of a time derivative of the empirical correlation func- 

223 tion [3, 4], it is difficult to compute this velocity for complex 

224 dividing surfaces in computer simulations, and difficult to 

225 measure experimentally in single-molecule experiments. 

226 We propose an alternative approach, similar in spirit to re- 

227 active flux but more closely related to the rate theories used 

228 in constructing Markov state models from molecular simu- 

229 lations [17-20], that avoids the need to compute the time 

230 derivative of the correlation function in Eq. 7. Instead, we 

231 estimate the rate h m (t) implied by the state-to-state transi- 

232 tion probabilities observed for a given observation interval 

233 t — referring to this quantity as the implied rate constant. As 

234 with the reactive flux, for times t where r mo \ < t <g r rxn , the 

235 phenomenolgical rate constant (if it exists, by virtue of a sep- 

236 aration of timescales) is recovered by fci m (t) , but our modified 

237 estimator provides a much larger plateau for times t > r mo i 

238 where a usable rate estimate can be extracted. 

239 As before, if a separation of timescales exists, relaxation be- 

240 havior for times t > r mo i is defined in terms of first order rate 

241 equations (Eq. 2), here recast in matrix form, 



|p(*) = Kp(*) 



(8) 



242 where p = [p A {t) p B {t)] T , p A {t) = (h A (t)) ne and p B (t) = 

243 {h.B(t)) ne denote the nonequilibrium occupation probabili- 

244 ties of states A and B at time t, and K is the matrix of rate 

245 constants 



K 



~k A ^B 
k A ^B 



k B - 
-k B - 



(9) 



246 The eigenvalues of K are Ai = 0, reflecting conservation of 

247 probability mass, and A2 = — (k A ^B + k B —*A) = —k, which 
24a governs the recovery toward equilibrium populations tya and 

249 -kb at the phenomenological relaxation rate k. 

250 The solution to Eq. 8 (corresponding to Eq. 3) is given by 



Pit) 



; p(0) = T(i)p(0) 



(10) 



251 where e A = J2^=o A™ /n\ is the formal matrix exponential 

252 and T(t) can be identified as the column- stochastic transi- 

253 tion probability matrix whose elements Tjt (t) give the condi- 

254 tional probability of observing the system in conformation j 

255 at time t given that it was initially in conformation i at time 0. 

256 The elements of T(t) for a given observation interval t are 

257 conveniently given in terms of the time-correlation function, 



T Si (t) = 



<Mo)M*)> 



(ii) 



263 a one-to-one correspondence between T(t) and the rate ma- 

264 trix 'Kim (t) it implies for any t, 



T(t) = 



K, m (i)( 



<^ K im (t)=t l logT(t), 



(12) 



265 where the logarithm denotes the matrix logarithm. Assuming 

266 a phenomenological rate constant k exists, all K im (t) w K for 

267 t > T mo h 

268 Because of their relationship through the exponential 

269 (Eq. 12), T(t) and K. im (t) share the same eigenvectors Ufc, 

270 and their respective eigenvalues /j,k(t) and Aj.(i) are simply 

271 related [21], 



Mfe(i) = e Afe(t)t . 



(13) 



The implied rate constant h m (t) for observation time t can be 
obtained from the nonzero eigenvalue of K irn (t), 



kim(t) = -A 2 (t) = -t 1 In fi 2 (t) 



(14) 



where ^(t) = 1 — (T A B(t) + T B A(t))- Using Eq. 11 and some 
algebra, we find \x 2 (t) can be written, 



{Sh A (0)Sh A (t)) 
(Shi) ' 



(15) 



which is simply the normalized fluctuation autocorrelation 
function for the indicator function h A for state A (or, equiv- 
alently for state B) . fi 2 (t) therefore takes the value of unity at 
t = and decays to zero at large t. 

Combining Eqs. 14 and 15 gives the expression for the im- 
plied rate estimate fcj m (t) of the phenomenological rate fe, 



fam(t) 



-t _1 ln 



(8hA(0)Sh A (t)) 
(Sh 2 A ) ' 



(16) 



282 which is the main result of this paper. 

283 In the limit t — > + , h m (t) reduces to the transition state 

284 



theory estimate /ctst, 



lim k- v 

t->o+ 



»(*) 



d {6h A (0)8h A (t)) 



dt 



(Sh A ) 



(17) 



285 just as for the reactive flux rate (Eq. 7) [3, 4]. Similarly, the 

286 true phenomenological rate k is given by the long-time limit 

287 Offcjm(t): 



k — lim fcim(t) = lim —t 



_! {Sh A (0)Sh A (t)} 
{61%) 



(18) 



288 However, when estimating the phenomenological rate 

289 through this expression, evaluation of the correlation func- 

290 tion should be for some t <C r rxn = k' 1 , as the statistical 

291 error in the estimate of fci m (t) grows with t (see Appendix). 

292 When there is a separation of timescales such that r mo i <g; 

293 T rxn , such that a phenomenological rate exists, we can see 

294 that fci m (i) and fcRF(i) are expected to provide similar esti- 

295 mates in the regime r mo i < t <C r rxn . We note Eq. 16 can 

296 be rearranged to yield a correlation function 



(Sh A (0)Sh A (t)) 
(8h\) 



-fc im (t) t 



(19) 



258 where the stationarity and time-reversal symmetry of phys- 

259 ical systems at equilibrium ensures that (hi(0) hj(t)) = 

260 (hj (0) hi (t)), and ni is the equilibrium probability of state i. 

261 For t > r mo i, we have T(t) w e Kt for a constant matrix K, 

262 but this will not hold for t < r mo i- Instead, we can establish 



297 By the definition of reactive flux (Eq. 7), we can write knF(t) 

298 in terms of h m (t) as, 



d (5h A (0)5h A (t)) _ d 



dt 



(Shi) 



dt 



-fc im (t) t 



k im (t) +t — k im (t) 



(20) 
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299 Whent » r mo i, then fci m (t) « fc, and we have /crf (t) w ke~ kt . 

300 Application of implied rate theory to single-molecule data. 

301 To illustrate the estimation of the phenomenological rate k 

302 using the implied timescale h lul (t), we computed it for the 

303 p5ab hairpin force trajectory described above. At the 50 kHz 

304 sampling rate (Fig. 3, right), the rate estimates are almost 

305 identical to those from k«e{€) for a broad range of times 

306 where t > r mo i, though there is much less noise in the fcj m (i) 

307 rate estimate than in ksF(t) (Fig. 3, right inset). At the 1 kHz 

308 sampling rate (Fig. 3, left), however, the rate estimate from 

309 feim(t) remains stable over several times r rxn , even though 

310 the k RF (t) has already decayed from the plateau region. The 
3n implied rate estimate, h m (t), therefore appears to provide a 

312 more robust estimate of the phenomenological rate under a 

313 variety of conditions. 

3« This robustness also carries over to an insensitivity to the 

315 placement of dividing surface i f , the problem reactive flux 

316 theory was originally envisioned to solve. Using an observa- 

317 tion time of r = 60 ms, the implied rate estimate fem (t) varies 

318 much less than the naive rate estimates over a large range of 

319 dividing surface choices (Fig. 2, middle panels, black dashed 

320 and solid lines). 

321 Microscopic rate constants. To obtain individual micro- 

322 scopic rates /ca^b and k B ^A, we recall that the phenomeno- 

323 logical rate k represents the sum of the forward and backward 

324 rates, 

k = k A ^ B + k B ^ A (21) 

325 as well as the fact that the flux across the dividing surface 

326 must be balanced at equilibrium, 

KAkA^>B = 7Tsfcs^A (22) 

327 which allows us to deduce that the individual rates are simply 

kA^B = 7Tfl fc ; k B ^A = 7T.4 k (23) 

328 The equilibrium probabilities %a and ns can be simply esti- 

329 mated by the fraction of samples observed on each side of the 

330 dividing surface x t , such that tta ~ (Aa) and^s « (/is). For 

331 the RNA hairpin, estimates of tta and n b are shown as a func- 

332 tion of dividing surface placement in Fig. 2 (bottom panel). 

333 As both the equilibrium probability and phenomenological 



334 rate estimates are sensitive to the choice of dividing surface, 

335 the microscopic rates kA~+B and ks->A will be more sensitive 

336 to the dividing surface placement than either property alone. 

337 The sensitivity of rates to the choice of dividing surface has 

338 some important implications. While thermodynamic quanti- 

339 ties (e.g. the free energy difference between two macrostates) 

340 are rather insensitive to the choice of dividing surface (as 

341 slight variation in 7ta and ttb is suppressed by the logarithm 

342 in AG = — fcsTln(7rA/7i"s)), rates (and other kinetic proper- 

343 ties such as commitment probabilities [8]) typically have ex- 

344 ponential weighting working in the opposite direction, mak- 

345 ing the definition of the surface particularly important. A key 

346 implication of this sensitivity is the challenge of comparing 

347 theory and experiment in kinetics — both must agree on the 

348 definition of the dividing surface in order to avoid confound- 

349 ing the comparison. This is also of course an issue with even 

350 comparing different experiments. While this problem is un- 

351 avoidable, our hope is that an approach which directly con- 

352 siders a detailed state decomposition [20, 22] will help further 

353 aid in the connection between theory and experiment. 
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